
Reviews: Superposition of many models into one

Neural Information Processing Systems

I appreciate that the authors re-implemented the CIFAR benchmark I had requested. However, I'm still unconvinced of the significance or the originality of the proposed approach. For me, two fundamental issues remain: 1) The proposed approach is conceptually very similar to the masking proposed in Masse et al. (2018) that I mentioned in my review. The only difference is essentially masking with a {1,-1} vector vs. masking with a {0,1} vector. For sufficiently sparse masks (as used in Masse et al.), the latter approach will also produce largely non-overlapping feature subsets for different tasks, so I don't see this as a huge difference.
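The reviewer's point can be illustrated numerically: with sufficiently sparse {0,1} masks, two tasks activate largely disjoint feature subsets, while {1,-1} sign masks keep every unit active but make the masked features near-orthogonal across tasks. The dimensions, sparsity level, and random masks below are illustrative, not taken from either paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000          # feature dimension (illustrative)
sparsity = 0.1    # fraction of units kept per task under {0,1} masking

# Two task-specific masks of each kind (all values illustrative).
binary_masks = [(rng.random(d) < sparsity).astype(float) for _ in range(2)]
sign_masks = [rng.choice([-1.0, 1.0], size=d) for _ in range(2)]

features = rng.standard_normal(d)

# {0,1} masking: each task reads a sparse, largely disjoint feature subset.
overlap = np.sum(binary_masks[0] * binary_masks[1])
print(f"binary-mask overlap: {overlap:.0f} of ~{sparsity * d:.0f} active units")

# {1,-1} masking: every unit stays active; tasks differ only in sign pattern,
# so the masked feature vectors are near-orthogonal rather than disjoint.
masked_a = features * sign_masks[0]
masked_b = features * sign_masks[1]
cos = masked_a @ masked_b / (np.linalg.norm(masked_a) * np.linalg.norm(masked_b))
print(f"cosine between sign-masked features: {cos:.3f}")
```

Either mechanism keeps per-task features from colliding, which is the basis of the reviewer's similarity claim.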


Continual Prototype Evolution: Learning Online from Non-Stationary Data Streams

De Lange, Matthias, Tuytelaars, Tinne

arXiv.org Artificial Intelligence

Attaining prototypical features to represent class distributions is well established in representation learning. However, learning prototypes online from streams of data proves a challenging endeavor, as they rapidly become outdated owing to the ever-changing parameter space during learning. Additionally, continual learning does not assume the data stream to be stationary, typically resulting in catastrophic forgetting of previous knowledge. We introduce a first system addressing both problems, where prototypes evolve continually in a shared latent space, enabling learning and prediction at any point in time. In contrast to the major body of work in continual learning, data streams are processed in an online fashion, without additional task information, and an efficient memory scheme provides robustness to imbalanced data streams. Besides nearest-neighbor-based prediction, learning is facilitated by a novel objective function encouraging cluster density about the class prototype and increased inter-class variance. Furthermore, latent-space quality is elevated by pseudo-prototypes in each batch, constituted by replay of exemplars from memory. We generalize the existing paradigms in continual learning to incorporate data-incremental learning from data streams by formalizing a two-agent learner-evaluator framework, and obtain state-of-the-art performance by a significant margin on eight benchmarks, including three highly imbalanced data streams. The prevalence of data streams in contemporary applications urges systems to learn in a continual fashion.
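The evolving-prototype idea can be sketched with a toy momentum update plus nearest-prototype prediction. The class name, momentum value, and update rule below are illustrative simplifications, not the paper's actual objective or update schedule:

```python
import numpy as np

class PrototypeClassifier:
    """Minimal sketch of online class prototypes in a shared latent space.

    The momentum update and nearest-neighbor rule here are illustrative;
    the paper's objective and pseudo-prototype replay differ in detail.
    """

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.protos = {}  # class label -> prototype vector

    def update(self, z, y):
        # Evolve the prototype of class y toward the latent feature z.
        if y not in self.protos:
            self.protos[y] = z.copy()
        else:
            p = self.protos[y]
            self.protos[y] = self.momentum * p + (1 - self.momentum) * z

    def predict(self, z):
        # Nearest-prototype prediction, available at any point in time.
        labels = list(self.protos)
        dists = [np.linalg.norm(z - self.protos[c]) for c in labels]
        return labels[int(np.argmin(dists))]

# Usage: stream latent features for two classes online, then classify.
rng = np.random.default_rng(1)
clf = PrototypeClassifier()
for _ in range(50):
    clf.update(rng.normal(loc=2.0, size=8), y=0)
    clf.update(rng.normal(loc=-2.0, size=8), y=1)
print(clf.predict(np.full(8, 1.5)))  # prints 0 (near the class-0 prototype)
```

Because prototypes are updated with the encoder's current latent space, prediction never has to wait for a task boundary.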


Bilevel Continual Learning

Pham, Quang, Sahoo, Doyen, Liu, Chenghao, Hoi, Steven C. H.

arXiv.org Machine Learning

Continual learning aims to learn continuously from a stream of tasks and data in an online fashion, exploiting what was learned previously to improve current and future tasks while still performing well on previous tasks. One common limitation of many existing continual learning methods is that they often train a model directly on all available training data without validation, due to the nature of continual learning, and thus suffer from poor generalization at test time. In this work, we present a novel continual learning framework named "Bilevel Continual Learning" (BCL), unifying a bilevel optimization objective with a dual memory management strategy, comprising both episodic memory and generalization memory, to achieve effective knowledge transfer to future tasks while simultaneously alleviating catastrophic forgetting on old tasks. Our extensive experiments on continual learning benchmarks demonstrate the efficacy of the proposed BCL compared to many state-of-the-art methods. Unlike humans, conventional machine learning methods, particularly neural networks, struggle to learn continuously because these models lose the ability to perform acquired skills when they learn a new task (French, 1999). Continual learning systems are specifically designed to learn continuously from a stream of tasks, accumulating knowledge over time to improve future learning outcomes while still performing well on previous tasks.
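The bilevel structure can be sketched on a toy problem: an inner solver fits model weights on "episodic memory" data, while an outer loop tunes a hyperparameter against held-out "generalization memory" so the inner solution also generalizes. The data, the ridge-regression inner problem, and the candidate grid are all illustrative, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])

# "Episodic memory" data (used by the inner problem) and held-out
# "generalization memory" data (used by the outer problem); illustrative.
X_ep = rng.standard_normal((40, 5))
y_ep = X_ep @ w_true + 0.5 * rng.standard_normal(40)
X_gen = rng.standard_normal((40, 5))
y_gen = X_gen @ w_true + 0.5 * rng.standard_normal(40)

def inner_solve(lam):
    # Inner level: closed-form ridge regression on the episodic memory.
    A = X_ep.T @ X_ep + lam * np.eye(5)
    return np.linalg.solve(A, X_ep.T @ y_ep)

def outer_loss(lam):
    # Outer level: validation loss of the inner solution on held-out data.
    w = inner_solve(lam)
    return np.mean((X_gen @ w - y_gen) ** 2)

# Outer loop: choose the hyperparameter by validating on the
# generalization memory rather than the training data itself.
lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lams, key=outer_loss)
print("chosen lambda:", best, "outer loss:", round(outer_loss(best), 3))
```

The point of the sketch is the split: the inner objective never sees the data that judges generalization, which is the gap BCL's dual memories are designed to close.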


Efficient Lifelong Learning with A-GEM

Chaudhry, Arslan, Ranzato, Marc'Aurelio, Rohrbach, Marcus, Elhoseiny, Mohamed

arXiv.org Machine Learning

In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior that may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong learning approaches in terms of sample complexity and computational and memory cost. Toward this end, we first introduce a new, more realistic evaluation protocol, whereby learners observe each example only once and hyper-parameter selection is done on a small, disjoint set of tasks that is not used for the actual learning experience and evaluation. Second, we introduce a new metric measuring how quickly a learner acquires a new skill. Third, we propose an improved version of GEM (Lopez-Paz & Ranzato, 2017), dubbed Averaged GEM (A-GEM), which matches or exceeds the performance of GEM while being almost as computationally and memory efficient as EWC (Kirkpatrick et al., 2016) and other regularization-based methods. Finally, we show that all algorithms, including A-GEM, can learn even more quickly when provided with task descriptors specifying the classification tasks under consideration. Our experiments on several standard lifelong learning benchmarks demonstrate that A-GEM offers the best trade-off between accuracy and efficiency.
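The A-GEM correction itself reduces to a single projection: when the current-task gradient conflicts with an average gradient computed on an episodic-memory batch, the conflicting component is removed. A minimal sketch of that update (variable names are ours):

```python
import numpy as np

def agem_project(g, g_ref):
    """A-GEM gradient correction (Chaudhry et al., 2019).

    g     : gradient on the current task
    g_ref : average gradient on a batch sampled from episodic memory
    If g would increase the memory loss (negative dot product with
    g_ref), project g onto the half-space where that loss does not grow.
    """
    dot = g @ g_ref
    if dot >= 0:
        return g  # no interference: keep the gradient as-is
    return g - (dot / (g_ref @ g_ref)) * g_ref

# Usage: a conflicting gradient is projected, a compatible one is untouched.
g_ref = np.array([1.0, 0.0])
g_conflict = np.array([-1.0, 1.0])  # g_conflict @ g_ref = -1 < 0
g_ok = np.array([1.0, 1.0])
print(agem_project(g_conflict, g_ref))  # prints [0. 1.]
print(agem_project(g_ok, g_ref))        # prints [1. 1.]
```

Using one averaged reference gradient instead of GEM's per-task quadratic program is what makes A-GEM nearly as cheap as regularization-based methods.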


Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference

Riemer, Matthew, Cases, Ignacio, Ajemian, Robert, Liu, Miao, Rish, Irina, Tu, Yuhai, Tesauro, Gerald

arXiv.org Artificial Intelligence

Poor performance in continual learning over non-stationary data distributions remains a major challenge in scaling neural network learning to more human-realistic settings. In this work we propose a new conceptualization of the continual learning problem in terms of a temporally symmetric trade-off between transfer and interference that can be optimized by enforcing gradient alignment across examples. We then propose a new algorithm, Meta-Experience Replay (MER), that directly exploits this view by combining experience replay with optimization-based meta-learning. This method learns parameters that make interference from future gradients less likely and transfer from future gradients more likely. We conduct experiments across continual lifelong supervised learning benchmarks and non-stationary reinforcement learning environments, demonstrating that our approach consistently outperforms recently proposed continual learning baselines. Our experiments show that the gap between MER and the baseline algorithms grows both as the environment becomes more non-stationary and as the fraction of the total experiences stored becomes smaller.
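The transfer/interference criterion, and the Reptile-style interpolation MER builds its meta-update on, can be sketched with toy quadratic losses. The learning rate, interpolation factor, and data below are illustrative, not the paper's hyperparameters:

```python
import numpy as np

def grad(w, x, y):
    # Gradient of the squared loss 0.5 * (w @ x - y)^2 for one example.
    return (w @ x - y) * x

def alignment(w, ex_a, ex_b):
    # Positive dot product between example gradients => transfer;
    # negative => interference (the trade-off MER optimizes).
    return grad(w, *ex_a) @ grad(w, *ex_b)

def mer_outer_step(w, batch, lr=0.05, beta=0.5):
    # Inner SGD pass over replayed + current examples, then a
    # Reptile-style interpolation back toward the start point, which
    # implicitly rewards gradient alignment across the batch.
    w_start = w.copy()
    for x, y in batch:
        w = w - lr * grad(w, x, y)
    return w_start + beta * (w - w_start)

# Usage on a toy stream of examples sharing one target structure.
rng = np.random.default_rng(2)
w = rng.standard_normal(4)
batch = [(rng.standard_normal(4), 1.0) for _ in range(8)]
a = alignment(w, batch[0], batch[1])
print("first pair:", "transfer" if a > 0 else "interference")
w_new = mer_outer_step(w, batch)
loss = lambda v: np.mean([(v @ x - y) ** 2 for x, y in batch])
print("loss before:", round(loss(w), 3), "after:", round(loss(w_new), 3))
```

The interpolation step is the Reptile pattern the paper combines with reservoir-sampled replay; the full algorithm interleaves these updates across many replayed batches.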